webui: remove client-side context pre-check and rely on backend for limits #16506
Conversation
Just a few cosmetic changes 😄 Also, could you add screenshots/videos to the PR description comparing before and after the change? It would be great context when looking back at this later.
(Two review threads on tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte, both resolved and now outdated.)
I’d love to make a temporary mini version of the model selector: just a simple field in Settings to declare the model in the JSON request. That way my llama-swap would work on master, and I could record videos of the master branch more easily!
I’ve added two videos, running on my Raspberry Pi 5 (16 GB) with Qwen3 30B A3B, fully synced with the master branch. You can see the bug where I got stuck: once the context overflows, the interface is completely blocked until you hit F5. With the current PR build it’s much better: if a message block is too large, it can still slip into the context and then needs to be deleted manually, but since the backend decides, the UI never fully blocks. We could still improve it a bit by preventing oversized messages from being sent into the context in the first place; see the soft-guard sketch below.
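A minimal sketch of what such a non-blocking soft guard could look like; `estimateTokens`, `contextSize`, and `notifyUser` are hypothetical names for illustration, not the actual webui API:

```ts
// Hypothetical soft guard: warn before sending a likely-oversized message,
// but never block; the backend stays the source of truth for limits.

// Rough heuristic: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function warnIfLikelyOversized(
  message: string,
  contextSize: number, // assumed to be reported by the server
  notifyUser: (msg: string) => void
): void {
  if (estimateTokens(message) > contextSize) {
    notifyUser(
      'This message may exceed the context window; the server will decide.'
    );
  }
}
```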
Toolcall testing (Node.js proxy): Google.what.the.weather-AVC-750kbps.mp4
@ServeurpersoCom Curious: are you doing some OCR in the last video to detect text elements in the screenshots? Would love to learn more, but maybe after the PR is reviewed, to avoid going off-topic.
@ServeurpersoCom let's just rebuild the webui static output fresh and we're good to go :)
…imits

Removed the client-side context window pre-check and now simply send messages, keeping the dialog imports limited to core components and eliminating the maximum-context alert path.

Simplified streaming and non-streaming chat error handling to surface a generic 'No response received from server' error whenever the backend returns no content.

Removed the obsolete maxContextError plumbing from the chat store so state management now focuses on the core message flow without special context-limit cases.
Co-authored-by: Aleksander Grygier <[email protected]>
Co-authored-by: Aleksander Grygier <[email protected]>
…Screen.svelte Co-authored-by: Aleksander Grygier <[email protected]>
…Screen.svelte Co-authored-by: Aleksander Grygier <[email protected]>
Force-pushed from 28badc5 to be85c24
@ServeurpersoCom Actually, I will improve the UI/UX of the new Alert Dialog in a separate PR so that we don't block this change :)
Not OCR: the proxy just parses streamed text and DOM elements in real time. The model actually sees the entire page: it can analyze the full DOM and reach elements outside the viewport through an abstraction layer that simulates human actions (scroll, click, type).
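A rough illustration of what that abstraction layer's output could look like; every name and shape here is a guess for illustration, not the actual proxy code:

```ts
// Illustrative only: each interactable element gets a numeric label plus its
// bounding box, so the model can address elements by label even when they
// sit outside the viewport.
interface LabeledElement {
  label: number; // identifier the model uses to refer to the element
  tag: string;   // e.g. 'button', 'input', 'a'
  text: string;  // visible text
  box: { x: number; y: number; w: number; h: number };
}

// Serialize labeled elements into compact text tokens for the LLM prompt.
function serializeForModel(elements: LabeledElement[]): string {
  return elements
    .map(
      (e) =>
        `[${e.label}] <${e.tag}> "${e.text.slice(0, 40)}" ` +
        `@ (${e.box.x},${e.box.y} ${e.box.w}x${e.box.h})`
    )
    .join('\n');
}
```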
Awesome, can’t wait to see your pure Svelte touch on that dialog 😄
Nice. So this seems like some sort of ingenious way to control a (headless?) browser with an LLM, and the images in the WebUI are just "progress reports" from the browser. It's a bit over my head, but it definitely looks interesting.
Exactly, but not headless: a full real browser with GPU capability (inside a software box)! The goal is to convert the DOM (with all bounding boxes) into labeled text tokens for the LLM. Idea: we could add a small module in llama.cpp that exposes every ToolCall event through a user-defined HTTP hook; that would let anyone easily connect their model to external actions or systems! (Rough sketch below.)
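A minimal Node.js sketch of what such a hook receiver could look like; the `/toolcall` path and the payload shape are assumptions, since no such hook exists in llama.cpp today:

```ts
// Hypothetical receiver for the proposed user-defined HTTP hook: llama.cpp
// would POST every ToolCall event here, and this endpoint dispatches it.
import { createServer } from 'node:http';

createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/toolcall') {
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      // Assumed payload shape, e.g. { name, arguments }.
      const event = JSON.parse(body);
      console.log('tool call:', event.name, event.arguments);
      // Dispatch to external systems here (browser control, automation, ...).
      res.writeHead(204).end();
    });
  } else {
    res.writeHead(404).end();
  }
}).listen(8088, () => console.log('toolcall hook listening on :8088'));
```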
* origin/master: (32 commits)
  metal : FA support F32 K and V and head size = 32 (ggml-org#16531)
  graph : support cacheless embeddings with FA and iSWA (ggml-org#16528)
  opencl: fix build targeting CL 2 (ggml-org#16554)
  CUDA: fix numerical issues in tile FA kernel (ggml-org#16540)
  ggml : fix build broken with -march=armv9-a on MacOS (ggml-org#16520)
  CANN: fix CPU memory leak in CANN backend (ggml-org#16549)
  fix: add remark plugin to render raw HTML as literal text (ggml-org#16505)
  metal: add support for opt_step_sgd (ggml-org#16539)
  ggml : fix scalar path for computing norm (ggml-org#16558)
  CANN: Update several operators to support FP16 data format (ggml-org#16251)
  metal : add opt_step_adamw and op_sum (ggml-org#16529)
  webui: remove client-side context pre-check and rely on backend for limits (ggml-org#16506)
  [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (ggml-org#16521)
  ci : add Vulkan on Ubuntu with default packages build (ggml-org#16532)
  common : handle unicode during partial json parsing (ggml-org#16526)
  common : update presets (ggml-org#16504)
  ggml : Fix FP16 ELU positive branch (ggml-org#16519)
  hparams : add check for layer index in is_recurrent (ggml-org#16511)
  ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518)
  CUDA: faster tile FA, add oob checks, more HSs (ggml-org#16492)
  ...
webui: remove client-side context pre-check and rely on backend for limits
Removed the client-side context window pre-check; messages are now simply sent, with the dialog imports kept limited to core components, eliminating the maximum-context alert path.

Simplified streaming and non-streaming chat error handling to surface a generic 'No response received from server' error whenever the backend returns no content (sketched below).

Removed the obsolete maxContextError plumbing from the chat store, so state management now focuses on the core message flow without special context-limit cases.
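A minimal sketch of that simplified error path, assuming an OpenAI-style response shape; the function and field names are illustrative, not the actual webui code:

```ts
// If the backend returns no content (context overflow or anything else),
// surface one generic error instead of a client-side context pre-check.
async function sendChat(url: string, payload: unknown): Promise<string> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  const data = await res.json().catch(() => null);
  const content = data?.choices?.[0]?.message?.content;
  if (!content) {
    // The backend decided; the UI never blocks on a local check.
    throw new Error('No response received from server');
  }
  return content;
}
```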
Closes #16437
Master branch:
https://github.com/user-attachments/assets/edc1337d-2e19-4f99-a7ba-78f40146022f
This PR (don't mind the Model Selector):
https://github.com/user-attachments/assets/e9952e04-e189-434f-8536-84184193d704